Artificial intelligence-assisted metastasis and prognosis model for patients with nodular melanoma

Objective The objective of this study was to identify the risk factors that influence metastasis and prognosis in patients with nodular melanoma (NM), as well as to develop and validate a prognostic model using artificial intelligence (AI) algorithms. Methods The Surveillance, Epidemiology, and End Results (SEER) database was queried for 4,727 patients with NM based on the inclusion/exclusion criteria. Their clinicopathological characteristics were retrospectively reviewed, and logistic regression analysis was utilized to identify risk factors for metastasis. This was followed by employing Multilayer Perceptron (MLP), Adaptive Boosting (AB), Bagging (BAG), logistic regression (LR), Gradient Boosting Machine (GBM), and eXtreme Gradient Boosting (XGB) algorithms to develop metastasis models. The performance of the six models was evaluated and compared, leading to the selection and visualization of the optimal model. Through integrating the prognostic factors of Cox regression analysis with the optimal models, the prognostic prediction model was constructed, validated, and assessed. Results Logistic regression analyses identified that marital status, gender, primary site, surgery, radiation, chemotherapy, system management, and N stage were all independent risk factors for NM metastasis. MLP emerged as the optimal model among the six models (AUC = 0.932, F1 = 0.855, Accuracy = 0.856, Sensitivity = 0.878), and the corresponding network calculator (https://shimunana-nm-distant-m-nm-m-distant-8z8k54.streamlit.app/) was developed. The following were examined as independent prognostic factors: MLP, age, marital status, sequence number, laterality, surgery, radiation, chemotherapy, system management, T stage, and N stage. System management and surgery emerged as protective factors (HR < 1). To predict 1-, 3-, and 5-year overall survival (OS), a nomogram was created. 
The validation results demonstrated that the model exhibited good discrimination and consistency, as well as high clinical usefulness. Conclusion The developed prediction model more effectively reflects the prognosis of patients with NM and differentiates between the risk level of patients, serving as a useful supplement to the classical American Joint Committee on Cancer (AJCC) staging system and offering a reference for clinically stratified individualized treatment and prognosis prediction. Furthermore, the model enables clinicians to quantify the risk of metastasis in NM patients, assess patient survival, and administer precise treatments.


Introduction
Cutaneous melanoma (CM), originating from cutaneous melanocytes, exhibits invasive growth and is distinguished by its high rates of metastasis and recurrence [1][2][3]. Nodular melanoma (NM) comprises approximately 14% of all CM cases and displays a higher mortality rate than other subtypes [4][5][6], with research indicating that it can be fatal in over 40% of melanoma patients [7].
The early diagnosis and identification of metastases are crucial for improving the prognosis and reducing mortality in NM. However, NM exhibits rapid growth and early invasiveness, with infiltration rates estimated at up to 0.5 mm/month, offering a narrower window for early diagnosis compared to other melanomas [8]. Moreover, NM deviates from the typical melanoma growth pattern, and its clinical presentation often lacks the classic features of melanoma, at times mimicking benign lesions that manifest as pink or mottled papules, thereby complicating early detection and diagnosis for both physicians and patients [9]. Current conventional screening methods also offer limited assistance in the early diagnosis and identification of metastases in NM: NM can present a nonspecific pattern on dermoscopy and lack identifiable melanoma features, potentially evading clinical and dermoscopic detection [10].
Beyond early diagnosis and identification of metastases, understanding the principal factors influencing disease progression and assessing patient prognosis precisely are imperative in determining optimal treatment strategies. Currently, the American Joint Committee on Cancer (AJCC) staging system stands as the most prevalent method for assessing the prognosis of patients with NM [11,12]. However, the AJCC system primarily includes information on the tumor's original site and distant metastases, omitting crucial details like patient-specific factors, additional tumor characteristics and treatment methods, thus limiting its predictive accuracy for CM patient prognosis [13,14]. Prognostic nomograms forecast clinical outcomes by integrating various prognostic variables into quantifiable values, presented visually, offering benefits over conventional TNM staging [15]. Consequently, some researchers have suggested the nomogram as an alternative to the AJCC staging system, potentially establishing a new prognostic benchmark.
Artificial intelligence (AI), underpinned by machine learning (ML) technologies, is currently advancing in the medical field and beyond [16][17][18][19]. AI can facilitate the modeling and prediction of medical data. Currently, research into the metastasis and prognosis of NM patients remains scant, with a notable absence of pertinent predictive models. Notably, a model that integrates ML algorithms to predict metastasis and prognosis in NM patients represents a pioneering endeavor. This study aims to identify patients with NM within the Surveillance, Epidemiology, and End Results (SEER) database, to explore the prognostic factors associated with NM patients, and to integrate ML algorithms for predicting metastasis in NM patients. It further aims to develop a nomogram for predicting the prognosis of NM patients based on these prognostic factors, and to validate and assess its efficacy. Additionally, a risk stratification system, derived from the nomogram scores, was created to classify NM patients into low-risk and high-risk categories. This approach will facilitate accurate and personalized prognostic assessment for NM patients, thereby aiding clinicians in tailoring treatment and follow-up strategies.

Patient screening
Clinical details for 4,727 NM patients were extracted from the SEER database, covering CM cases diagnosed between 2010 and 2015, in accordance with specified inclusion and exclusion criteria (Fig 1). Simultaneously, two independent collectors performed the extraction, with any disagreements resolved by a third arbitrator. SEER database information is anonymous, ensuring no breach of patient privacy. Furthermore, the SEER database is publicly accessible, negating the need for patient-informed consent.

Variables included in the study
Variables encompassed age, gender, race, marital status, sequence number, laterality, grade, primary site, mode of surgery at the primary site, AJCC 7th edition TNM staging, and radiotherapy information. X-tile software, utilizing Kaplan-Meier survival curves, classified patients into two age groups with an optimal cutoff at 60 years, transforming the continuous variable into a categorical one. Additionally, follow-up variables comprised survival months and vital status recodes (with study cutoff applied). The primary endpoint observed was overall survival (OS), defined as the duration from diagnosis to any-cause death by the end of follow-up.

Construction and validation of the metastasis prediction model
Independent risk factors for NM metastasis were identified through univariate and multivariate logistic regression analysis. To mitigate the impact of unbalanced data on the model construction, the SMOTE oversampling method was employed for preprocessing, with the oversampled data divided into a training set and a test set in a 7:3 ratio [20]. Six ML algorithms, namely Multilayer Perceptron (MLP), Adaptive Boosting (AB), Bagging (BAG), logistic regression (LR), Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGB), were utilized to develop a prediction model for NM metastasis. The predictive performance of these models was evaluated using ten-fold cross-validation, radar plot and confusion matrix analysis, leading to the selection of the top-performing ML model for predictive model development. Feature importance in the optimal model was analyzed, and a network calculator was developed for model visualization.
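The oversampling and 7:3 split described above can be illustrated with a minimal sketch. It is not the study's pipeline: the source does not name the SMOTE implementation it used, the data below are made-up two-feature points, and the helper names `smote` and `split_7_3` are hypothetical. The sketch only shows the core SMOTE idea, in which each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (a simplified illustration of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def split_7_3(rows, seed=0):
    """Shuffle the rows and split them into 70% training and 30% test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Made-up toy data: 20 non-metastatic (label 0) and 4 metastatic (label 1) points.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(4)]

# Oversample the minority class until both classes have the same size.
extra = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced = [(x, 0) for x in majority] + [(x, 1) for x in minority + extra]

# The sketch splits after oversampling because that is the order the text describes.
train, test = split_7_3(balanced)
print(len(balanced), len(train), len(test))
```

In practice a maintained SMOTE implementation would presumably be preferred over this hand-written version; the sketch is meant only to make the interpolation step concrete.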

Construction and validation of the prognostic model
Prognostic variables linked to OS in NM patients were identified through univariate Cox analysis, and variables with P < 0.05 were subsequently analyzed via multivariate Cox analysis to isolate independent prognostic factors. A nomogram was developed based on these identified independent prognostic factors. Nomogram performance was evaluated using receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA) to determine the nomogram's clinical utility through threshold probability net benefits. Kaplan-Meier curves were generated to visualize differences in OS prognostic factors among NM patients. Patients were classified into high-risk and low-risk groups based on risk scores, with corresponding risk score maps and heat maps created for visualization.
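As an illustration of the Kaplan-Meier and risk-grouping steps described above, the following sketch computes the product-limit survival estimate for a handful of made-up patients and splits them into high-risk and low-risk groups at the median risk score. The follow-up times, death indicators and risk scores are invented for illustration, and the function name `kaplan_meier` is hypothetical; the study itself used R packages for these analyses.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.
    times: follow-up in months; events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up follow-up data for eight patients.
months = [6, 12, 12, 20, 30, 36, 48, 60]
died = [1, 1, 0, 1, 0, 1, 0, 0]
risk = [3.1, 2.8, 1.2, 2.5, 0.9, 2.2, 1.0, 0.7]  # hypothetical risk scores

# Median split into high-risk and low-risk groups.
cutoff = median(risk)
high = [i for i, r in enumerate(risk) if r > cutoff]
low = [i for i, r in enumerate(risk) if r <= cutoff]

km_all = kaplan_meier(months, died)
km_high = kaplan_meier([months[i] for i in high], [died[i] for i in high])
km_low = kaplan_meier([months[i] for i in low], [died[i] for i in low])
print(km_all)
```

Each step of the curve multiplies the running survival probability by the fraction of at-risk patients who survive that time point; censored patients leave the risk set without lowering the curve.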

Statistical analysis
Statistical analyses were conducted using Python (version 3.8) and R software (version 4.0.2). Count data were presented as the number of cases and percentage (%), with the Chi-squared test employed for group comparisons. The ROC curve assessed the nomogram's discriminatory performance, the area under the curve (AUC) its predictive capacity for OS, the calibration plot its calibration, and the DCA curve its net benefit and clinical utility. The 'survivalROC' (version 1.0.3.1), 'rms' (version 6.3-0), 'survival' (version 3.4-0), 'foreign' (version 0.8-83) R packages, among others, were utilized for the construction and validation of the metastasis and prognostic model. The SHAP values for model interpretation were derived using the Python SHAP package. A P-value < 0.05 was deemed to indicate statistical significance.
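Two of the quantities named above, the AUC and the DCA net benefit, can be written out in a few lines. The sketch below is illustrative only: the labels and predicted probabilities are made up, and the function names `auc` and `net_benefit` are hypothetical. It uses the rank interpretation of the AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) and the standard net benefit formula TP/n - (FP/n) * pt/(1 - pt) at a threshold probability pt.

```python
def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at one threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Made-up outcomes (1 = event) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

model_auc = auc(y_true, y_score)
nb_model = net_benefit(y_true, y_score, threshold=0.5)
# "Treat all" reference strategy: every patient is classified as positive.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.5)
print(model_auc, nb_model, nb_treat_all)
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" (net benefit of zero) strategies.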

Patient characteristics
Data on melanoma patients from the SEER database spanning 2004 to 2017, screened according to inclusion and exclusion criteria, resulted in 4,727 eligible NM patients, comprising 2,903 (61.4%) males and 1,824 (38.6%) females, with 97.4% identified as white (Table 1). Patients were categorized into two groups based on the occurrence of metastasis, and a chi-squared test was used to compare the two groups. The findings indicated that 214 out of 4,727 patients exhibited metastases, with statistically significant differences (p < 0.05) observed across eight variables (gender, primary site, surgery, radiation, chemotherapy, system management, T stage and N stage) between metastatic and non-metastatic groups (Table 1).
2.2 Selection and validation for the optimal machine learning model. Six ML algorithms (MLP, AB, BAG, LR, GBM and XGB) were utilized to develop prediction models for NM metastasis. Ten-fold cross-validation, used to compare the models' predictive
i Surgery (No surgery of primary site = 0, Local tumor destruction = 1, Biopsy of primary tumor followed by a gross excision of the lesion, does not have to be done under the same anesthesia = 2, Wide excision or re-excision of lesion or local amputation with margins more than 1 cm. Margins MUST be microscopically negative = 3, Other/Unknown = 4).
j Radiation (None/Unknown = 0, Yes = 1).
In conclusion, MLP was ultimately selected as the predictive model for NM metastasis, showcasing its superior performance across various metrics.
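The comparison metrics reported for the models (accuracy, sensitivity and F1) all derive from the confusion matrix. A minimal sketch of those definitions follows, using made-up labels and predictions rather than the study's data; the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, precision and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Made-up example: 1 = metastasis, 0 = no metastasis.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```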
2.3 Feature importance and visualization of the MLP model. Feature importance of the MLP model was interpreted using the SHapley Additive exPlanation (SHAP). The marginal contribution of the ten features to the model output was interpreted for all samples, combining the feature importance and feature effects with a summary plot. The results demonstrated that the sample distribution for the features of laterality and surgery was more dispersed, with a wide range of Shapley values, indicating a significant impact of laterality and surgery. In contrast, the distribution of chemotherapy was centered around SHAP = 0, indicating the least impact (Fig 4A). Subsequently, force plots were created to illustrate the feature interpretation of predictions for individual samples (Fig 4B and 4C).
Concurrently, a web-based calculator was developed to visualize the model and support the clinical application of metastasis prediction in NM patients (https://shimunana-nm-distant-m-nm-m-distant-8z8k54.streamlit.app/).
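The Shapley values behind the SHAP plots described above can be made concrete with a small exact calculation. The sketch below is not the SHAP package and not the study's MLP: it uses a hypothetical three-feature toy model with made-up coefficients and computes each feature's Shapley value as its average marginal contribution over all feature orderings, with absent features set to a baseline value. The SHAP package relies on approximations of this quantity for realistic models.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample: the average marginal contribution
    of each feature over all orderings. Features not yet 'switched on' take
    their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]  # switch feature j from baseline to its real value
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [value / len(orders) for value in phi]


def toy_model(features):
    """Hypothetical risk model with made-up coefficients and one interaction."""
    surgery, n_stage, chemo = features
    return 0.10 + 0.30 * n_stage - 0.20 * surgery + 0.05 * chemo + 0.10 * n_stage * chemo


x = [1, 1, 1]         # sample to explain: surgery, N stage, chemotherapy
baseline = [0, 0, 0]  # reference values for absent features
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the sample and the prediction at the baseline, which is the additivity property that a force plot visualizes: features with positive values push the output up and those with negative values push it down.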

Univariate and multivariate Cox regression.
In conjunction with the established MLP metastasis model, a prognostic model for NM was further developed. Univariate and multivariate Cox regression analyses were employed to identify prognostic variables associated with survival in NM patients. Univariate Cox analysis of all variables in the sample revealed that MLP, age, marital status, race, sequence number, gender, laterality, primary site, surgery, radiation, chemotherapy, system management, T stage, and N stage were significantly correlated with the prognosis of NM patients (p < 0.05) (Table 4).
Subsequent multivariate Cox regression analysis identified MLP (HR = 1.61, 95% CI = 1.267-2.046, p < 0.001), age (>60, HR = 2.39, 95% CI = 2.091-2.732, p < 0.001), marital status (unmarried, HR = 1.499, 95% CI = 1.334-1.684, p < 0.001), sequence number (more, [...] 4). Forest plots of univariate and multivariate Cox regression analyses revealed that system management and surgery acted as protective factors for the prognosis of NM patients (HR < 1), whereas age, chemotherapy, laterality, marital status, MLP, N stage, T stage, radiation and sequence number emerged as risk factors (HR > 1) (Fig 5).
Risk scores for all samples were ordered from low to high, with the median value (risk score = 2) serving as the cutoff to classify patients into high-risk and low-risk groups (Fig 9A). Ordering the risk scores from low to high then revealed each patient's survival time, with a higher mortality rate in the high-risk group compared to the low-risk group, further validating

Discussion
NM exhibits greater aggressiveness and a higher risk of metastasis compared to other melanoma subtypes [21,22]. Despite representing only 14% of CM [4], it constitutes a significant portion of melanomas that ultimately prove fatal [23]. Early detection and accurate diagnosis are pivotal in improving prognosis. Upon diagnosis, personalized risk stratification and prognostication can extend survival by guiding the choice of optimal treatment and follow-up strategies. While the AJCC staging system is the predominant method for tumor prognosis assessment, its application to CM patient evaluation has notable limitations [24,25]. The nomogram evaluates individual survival probabilities at specified times, is user-friendly, and presents clear benefits over the AJCC staging system [26][27][28]. This study developed a nomogram and risk stratification system to predict OS in NM patients, leveraging the SEER database and ML algorithms. Model validation was conducted through ROC curves,
This study identified marital status, gender (female), primary site (skin of upper limb and shoulder), surgery, radiation, chemotherapy, system management, and N stage as independent risk factors for NM metastasis. Among the six ML models constructed, the MLP was selected as the predictive model for NM metastasis. Univariate and multivariate Cox analyses were conducted on the included variables to identify prognostic factors affecting OS in NM patients. Patients who were over 60, had multiple tumors, were unmarried, did not undergo radical surgery, were at a late stage, had lymph node metastases, or received chemotherapy, systemic therapy, or radiation therapy exhibited poorer prognoses. Generally, advanced age serves as an indicator of poorer prognosis across all histological subtypes of melanoma [29]. In this study, NM patients' ages were categorized into two groups (≤ 60 years and > 60 years) using X-tile software. The findings indicated a deteriorating prognosis with advancing age, aligning with previous research [30]. This trend could be attributed to increased underlying disease, diminished physical function, and higher tumor burden in older patients. Hence, beyond tumor treatment, underlying disease in elderly NM patients requires greater attention. Within our patient cohort, the majority of patients (61.4%) were male, mirroring the recognized prevalence among NM patients [31]. Contrary to earlier findings, gender did not emerge as a risk factor for OS in NM patients in this study; however, it was identified as an independent risk factor for metastasis. Our study revealed that unmarried, older patients experienced worse prognoses. This observation is speculated to be linked to the patients' financial circumstances and access to adequate care.
Our study identified patients with T stage T3 (depth of 2 mm or more) and T4 (depth of 4 mm or more) as having a poor prognosis. According to the nomogram, the T stage score holds the greatest weight for OS and is considered the most significant predictor. Distinguished from other histological subtypes, NM is characterized by a rapid vertical growth phase of invasive melanocytes, fewer radial growth phases, and absent adjacent intraepidermal spread. This vertical growth tendency leads to increased Breslow thickness in NM [32]. Studies indicate that 40% to 50% of melanomas with a Breslow thickness exceeding 2 mm belong to the nodular subtype [23]. Breslow thickness serves as a crucial risk factor for NM patients [30]. This finding aligns with our results.
Lymph node metastases from NM emerged as a crucial risk factor for significantly deteriorating OS. A positive sentinel lymph node biopsy represents a significant risk factor for the substantial decline in OS [30]. Furthermore, our study revealed that NM patients undergoing chemotherapy, systemic therapy, and radiation therapy exhibited worse prognosis, potentially due to advanced stages and distant metastases.
This study detailed the primary NM occurrence sites for prognostic analysis, including the trunk, upper limbs and shoulders, lower limbs and buttocks, external ear, eyelids, vulva, and lips, among others. Within the patient cohort, the trunk was the most common site of NM, comprising 29.6% of the cases. Regarding OS, patients with NM in the vulva had the poorest prognosis. The reasons for prognostic differences in CM patients at various sites are not fully explained, but are thought to involve complex factors like regional lymphatic drainage and
Surgery remains the primary treatment modality for CM [33,34]. Our study identified surgery as a critical prognostic factor for OS in NM patients. CM patients undergoing surgical intervention exhibited significantly improved prognoses compared to those who did not. In early stages, most NM appears symmetrical and round, lacking specificity and potentially eluding the "ABCD" rule (asymmetry, irregular borders, color variation and diameter greater than 6 mm) [8], challenging identification even with dermoscopy. Although techniques like reflectance confocal microscopy (RCM) and optical coherence tomography (OCT) partially improve the NM detection rate, their diagnostic accuracy remains lower than for other melanoma subtypes. To mitigate the severe consequences of NM underdiagnosis, Moscarella et al. recommend the immediate removal of any nodular lesion that cannot be classified as benign [35]. While this seems a viable approach to enhancing early NM diagnosis, caution is advised in areas of functional and significant cosmetic importance to prevent excessive excision whose harms may outweigh the benefits.
While the AJCC staging system serves as a key reference for NM prognostic assessment, it includes a limited array of prognostic parameters, thus offering a restricted prognostic evaluation [36][37][38]. Nomogram prediction models have bridged this gap. Nevertheless, prognostic models specifically for NM patients are scarce. Based on this, the current study developed a nomogram to predict OS in NM patients. Furthermore, most malignancies are classified into clinical stages based on varying risk levels in patients. Based on the risk level, appropriate adjuvant treatments or follow-up strategies are selected. In this study, we created a risk stratification for NM patients based on their nomogram scores. Our risk stratification system, containing comprehensive prognostic information, enables the identification of NM patients within high-risk and low-risk categories, accurately differentiating between various NM risk levels. This risk stratification not only underpins further prognosis but also facilitates the creation of tailored treatment plans and follow-up strategies based on patients' risk levels.
Owing to limitations of the database and analysis tools, this study has shortcomings that future work will need to address. First, the absence of detailed data on influencing factors, such as LDH levels, gene mutation statuses, and targeted drug use, undeniably reduces the model's predictive accuracy. Second, the data sourced from the SEER database lack external validation, necessitating further verification of their broad applicability. Lastly, despite being based on a multicenter, large-sample database, this retrospective study harbors inherent flaws, such as the exclusion of CM patients with incomplete data, introducing selection bias. The next steps include enhancing data collection on prognostic indicators for NM patients and conducting external validation of the predictive model. A multicenter prospective study of NM will be pursued when feasible, aiming to refine prognostic assessment accuracy and support clinical decision-making. Furthermore, integrating the model with more sophisticated algorithmic approaches in future work is envisioned.

Conclusions
This study explored risk and prognostic factors related to metastasis in NM patients, validating and assessing an ML-enhanced nomogram for predicting OS. A quantitative prognosis assessment for NM patients was achieved, offering guidance for clinical decision-making. The model is low-cost, non-invasive, and easy to implement, useful for quantifying metastasis risk in NM patients and assessing survival. It enables early identification of high-risk patients, personalized and precise treatment, and the development of follow-up strategies. The model, of course, requires further real-world validation.

Fig 2. (A) Ten-fold cross-validation of the six machine learning models. (B) The ROC curves of the six machine learning models. 1-Specificity indicates the false positive rate of the model and Sensitivity indicates the true positive rate. (C) The ROC curves of the MLP. (D) The radar plots of the six machine learning models. AUC, the area under the curve; ROC, receiver operating characteristic; MLP, Multilayer Perceptron; AB, Adaptive Boosting; BAG, Bagging; LR, Logistic regression; GBM, Gradient Boosting Machine; XGB, eXtreme Gradient Boosting. https://doi.org/10.1371/journal.pone.0305468.g002

3.2 The prognostic nomogram. The development of a nomogram for predicting the 1-, 3- and 5-year OS in NM patients, utilizing the independent prognostic factors identified, enhances the readability of the prognostic model and offers personalized insights, aiding clinicians in evaluating patient survival and prognosis (Fig 6).
3.3 Validation and evaluation of the prognostic model. The calibration plot was employed to assess model consistency, with the 1-, 3- and 5-year OS plots closely aligning with the standard curve. This alignment indicates a strong correlation between the prognostic

Fig 4. Explanation of feature importance for MLP model using SHAP. (A) SHAP summary plot. X-axis is determined by each Shapley value. (B)(C) SHAP force plot. Features predicted to increase the output are depicted in red, while those predicted to decrease the output are in blue. The length of an arrow correlates directly with the magnitude of the feature's impact on the output. MLP, Multilayer Perceptron; SHAP, SHapley Additive exPlanation. https://doi.org/10.1371/journal.pone.0305468.g004
model and reality, demonstrating minimal discrepancy between predicted and actual prognoses (Fig 7A and 7C). The DCA curve was utilized to assess the model's applicability, revealing that our NM prognostic nomogram exhibited significant net benefit (Fig 7D), highlighting its strong clinical utility. The ROC curves evaluated the discrimination capability of the 1-year, 3-year and 5-year survival nomograms, with AUC values of 0.761, 0.768 and 0.77 respectively (Fig 7E). The Kaplan-Meier curves visualized differences in OS across prognostic factors among NM patients. They indicated that patients who were over 60, had multiple tumors, were unmarried, did not undergo radical surgery, had a primary site in the vulva, were at a late stage, had lymph node metastases, or underwent chemotherapy, systemic therapy, or radiation therapy had poorer prognoses (Fig 8).

Fig 6. Nomogram of 1-year, 3-year, and 5-year OS for patients with nodular melanoma. Prognostic variables for a specific patient are aligned with the top "Points", generating an individual score for each variable. Subsequently, the sum of these scores is aligned with the bottom "Total Points" to estimate the patient's overall survival probability. OS, overall survival; MLP, multilayer perceptron. https://doi.org/10.1371/journal.pone.0305468.g006

Fig 9. (A) Risk score grouping. (B) Risk scores and patient survival time. (C) Risk heat map. The green line represents the low-risk stratum and the red line represents the high-risk stratum. https://doi.org/10.1371/journal.pone.0305468.g009

Table 1. Patient baseline information.
performance, revealed that all six models achieved AUC values above 0.8. MLP ranked as the most accurate, followed by XGB, with the conventional LR model as the least effective (Fig 2A). Model discrimination was assessed using the AUC of the ROC curves. The ROC analysis for the six ML models indicated strong performance and discrimination across all, with MLP outperforming the rest (Fig 2B). The MLP prediction model's average AUC under the ROC curve was 0.93 ± 0.007, demonstrating a high level of discrimination (Fig 2C).